Machine learning enabled query re-optimization algorithms for cloud database systems
In cloud database systems, hardware configurations, data usage, and workload allocations change continuously. These changes make it difficult for the query optimizer to obtain an optimal query execution plan (QEP) based on data statistics collected before query execution. To optimize a query with a more accurate cost estimation and thereby achieve such a QEP, performing query re-optimizations during query execution has been proposed in the literature. However, some re-optimizations may yield no gain in query response time or monetary cost and, due to their overheads, may even degrade query performance. This raises the question of how to determine when a re-optimization is beneficial. In addition, a Service Level Agreement (SLA) is signed between users and the cloud provider. Query re-optimization is therefore a multi-objective optimization problem that minimizes not only query execution time and monetary cost but also SLA violations. However, none of the existing query re-optimization algorithms considers all three objectives together, and none can predict when a re-optimization is beneficial.
To fill this gap, this dissertation proposes four novel query re-optimization algorithms: ReOpt, ReOptML, ReOptRL, and SLAReOptRL. Extensive theoretical and experimental evaluations show that each of them outperforms state-of-the-art techniques in execution time, monetary cost, and SLA violation rate on the TPC-H database benchmark.
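The core question above, when does a re-optimization pay off, can be framed as a cost-benefit test over runtime statistics observed mid-execution. A minimal sketch of that idea (the feature names and the benefit rule are hypothetical illustrations, not the dissertation's actual models):

```python
from dataclasses import dataclass

@dataclass
class QueryState:
    # Hypothetical runtime statistics observed during query execution.
    est_remaining_time: float   # optimizer's current remaining-time estimate (s)
    observed_rate_ratio: float  # actual vs. estimated tuple-processing rate
    reopt_overhead: float       # cost of pausing execution and re-planning (s)

def reoptimization_beneficial(state: QueryState) -> bool:
    """Re-optimize only if the projected time saved by correcting a
    mis-estimated plan exceeds the re-optimization overhead."""
    # A large deviation between observed and estimated progress means
    # the current plan's cost model is stale.
    deviation = abs(1.0 - state.observed_rate_ratio)
    projected_saving = deviation * state.est_remaining_time
    return projected_saving > state.reopt_overhead
```

In the learned variants (e.g. ReOptML), such a hand-written rule would be replaced by a classifier trained on execution traces; the decision interface stays the same.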
Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs
Graph neural networks (GNNs), as the de-facto model class for representation learning on graphs, are built upon the multi-layer perceptron (MLP) architecture with additional message-passing layers that allow features to flow across nodes. While conventional wisdom commonly attributes the success of GNNs to their advanced expressivity, we conjecture that this is not the main cause of GNNs' superiority in node-level prediction tasks. This paper pinpoints the major source of GNNs' performance gain as their intrinsic generalization capability, by introducing an intermediate model class dubbed P(ropagational)MLP, which is identical to a standard MLP in training but adopts the GNN architecture in testing. Intriguingly, we observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient to train. This finding sheds new light on the learning behavior of GNNs and can be used as an analytic tool for dissecting various GNN-related research problems. As an initial step toward analyzing the inherent generalizability of GNNs, we show that the essential difference between MLP and PMLP in the infinite-width limit lies in the NTK feature map in the post-training stage. Moreover, by examining their extrapolation behavior, we find that although many GNNs and their PMLP counterparts cannot extrapolate non-linear functions for extremely out-of-distribution samples, they have greater potential to generalize to testing samples near the training data range, a natural advantage of GNN architectures.
Comment: Accepted to ICLR 2023. Code: https://github.com/chr26195/PML
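The PMLP idea, train as a plain MLP, then insert message passing only at inference, can be sketched in a few lines. The mean-aggregation propagation below is one illustrative choice of message-passing operator, not necessarily the paper's exact formulation:

```python
import numpy as np

def mlp_forward(X, W1, W2):
    # Standard two-layer MLP: per-node transformation, no graph structure used.
    return np.maximum(X @ W1, 0) @ W2

def pmlp_forward(X, A, W1, W2):
    """PMLP-style inference: reuse the MLP-trained weights W1, W2, but
    insert a message-passing (mean-aggregation) step at test time."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    D_inv = 1.0 / A_hat.sum(axis=1, keepdims=True)
    P = D_inv * A_hat                            # row-normalized propagation matrix
    H = np.maximum(P @ X @ W1, 0)                # propagate, then transform
    return P @ H @ W2                            # propagate again before output layer
```

With an empty adjacency matrix (self-loops only), `pmlp_forward` reduces exactly to `mlp_forward`, which is why the two share trained weights.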
A Vision of a Decisional Model for Re-optimizing Query Execution Plans Based on Machine Learning Techniques
Many existing cloud database query optimization algorithms target reducing the monetary cost paid to cloud service providers in addition to query response time. These algorithms rely on accurate cost estimation so that the optimal query execution plan (QEP) is selected. The cloud environment is dynamic: hardware configurations, data usage, and workload allocations are continuously changing. These dynamic changes make an accurate query cost estimation difficult to obtain, and the query execution plan must be adjusted automatically to address them. To optimize the QEP with a more accurate cost estimation, the query needs to be optimized multiple times during execution, each time using the most up-to-date estimates. However, pausing execution for re-optimization introduces overhead, so the pause points must be chosen carefully. In this paper, we present our vision of a method that uses machine learning techniques to predict the best timings for optimization during execution.
Ion Exchange Membranes for Electrodialysis: A Comprehensive Review of Recent Advances
Electrodialysis-related processes are effectively applied in the desalination of sea and brackish water, wastewater treatment, the chemical process industry, and the food and pharmaceutical industries. The fundamental component in this process is the ion exchange membrane (IEM), which allows the selective transport of ions. Advances in IEMs not only make the process cleaner and more energy-efficient but also recover useful effluents that currently go to waste. Ion-exchange membranes with better selectivity, lower electrical resistance, and good chemical, mechanical, and thermal stability are required for these processes. Many strategies have been applied over the last two decades to develop new IEMs. This paper briefly reviews synthetic aspects in the development of new ion-exchange membranes and their applications in electrodialysis-related processes.
SLA-Aware Cloud Query Processing with Reinforcement Learning-based Multi-Objective Re-Optimization
Query processing on cloud database systems is a challenging problem due to the dynamic cloud environment. In cloud database systems, besides query execution time, users also consider the monetary cost paid to the cloud provider for executing queries. Moreover, a Service Level Agreement (SLA) is signed between users and cloud providers before any service is provided. Thus, from the profit-oriented perspective of cloud providers, query re-optimization is a multi-objective optimization problem that minimizes not only query execution time and monetary cost but also SLA violations. In this paper, we introduce ReOptRL and SLAReOptRL, two novel query re-optimization algorithms based on deep reinforcement learning. Experiments show that both algorithms improve query execution time and monetary cost by 50% over existing algorithms, and that SLAReOptRL has the lowest SLA violation rate of all the algorithms.
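In a reinforcement learning formulation such as the one described above, the three objectives are typically folded into a scalar reward that the agent maximizes. A minimal sketch of such a reward shape (the weights, scales, and penalty are invented for illustration, not the papers' actual formulation):

```python
def reward(exec_time, monetary_cost, sla_violated,
           w_time=0.4, w_cost=0.4, w_sla=0.2,
           time_scale=60.0, cost_scale=1.0):
    """Hypothetical multi-objective reward: negative weighted sum of
    normalized execution time and monetary cost, with an extra penalty
    when the SLA is violated. Higher (less negative) is better."""
    r = -(w_time * exec_time / time_scale + w_cost * monetary_cost / cost_scale)
    if sla_violated:
        r -= w_sla * 10.0   # heavy penalty for an SLA breach
    return r
```

The relative weights encode the provider's trade-off between speed, cost, and SLA compliance; tuning them changes which plans the agent prefers.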
Diagnosis and Prognosis Using Machine Learning Trained on Brain Morphometry and White Matter Connectomes
Accurate, reliable prediction of risk for Alzheimer's disease (AD) is essential for early, disease-modifying therapeutics. Multimodal MRI, such as structural and diffusion MRI, is likely to contain complementary information about neurodegenerative processes in AD. Here we tested the utility of commonly available multimodal MRI (T1-weighted structural and diffusion MRI), combined with high-throughput brain phenotyping (morphometry and connectomics) and machine learning, as a diagnostic tool for AD. We used, firstly, a clinical cohort at a dementia clinic (study 1: Ilsan Dementia Cohort; N=211; 110 AD, 64 mild cognitive impairment [MCI], and 37 subjective memory complaints [SMC]) to test and validate the diagnostic models; and, secondly, the Alzheimer's Disease Neuroimaging Initiative (ADNI)-2 (study 2) to test the generalizability of the approach and the prognostic models with longitudinal follow-up data. Our machine learning models trained on the morphometric and connectome estimates (number of features = 34,646) showed optimal classification accuracy (AD/SMC: 97%, MCI/SMC: 83%, AD/MCI: 97%) with iterative nested cross-validation in a single-site study, outperforming the benchmark model (FLAIR-based white matter hyperintensity volumes). In a generalizability study using ADNI-2, the combined connectome and morphometry model showed similar or superior accuracies (AD/HC: 96%; MCI/HC: 70%; AD/MCI: 75%) to the CSF biomarker model (t-tau, p-tau, Amyloid β, and their ratios). We also predicted MCI-to-AD progression with 69% accuracy, compared with 70% for the CSF biomarker model. The optimal classification accuracy in a single-site dataset and the reproduced results in a multi-site dataset show the feasibility of high-throughput imaging analysis of multimodal MRI and data-driven machine learning for predictive modeling in AD.
A Scored Semantic Cache Replacement Strategy for Mobile Cloud Database Systems
Current mobile cloud database systems are widespread and require special considerations for mobile devices. Although many systems rely on numerous metrics for use and optimization, few leverage metrics for decisional cache replacement on the mobile device. In this paper we introduce the Lowest Scored Replacement (LSR) policy, a novel cache replacement policy based on a predefined score that leverages contextual mobile data and user preferences for replacement decisions. We show an implementation of the policy using our previously proposed MOCCAD-Cache as the decisional semantic cache and our Normalized Weighted Sum Algorithm (NWSA) as the score basis. The score normalization is based on query response time, energy spent on the mobile device, and the monetary cost paid to a cloud provider. We then demonstrate a relevant scenario in which LSR excels compared to the Least Recently Used (LRU) and Least Frequently Used (LFU) cache replacement policies.
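The scored-replacement idea can be sketched as follows: each cached entry carries per-metric benefit estimates, each metric is min-max normalized across the cache, combined by a weighted sum, and the lowest-scoring entry is evicted. The weights, metric triple, and normalization direction below are illustrative choices, not the paper's exact NWSA definition:

```python
def lsr_evict(cache, weights=(0.4, 0.3, 0.3)):
    """Evict the entry with the Lowest Score. Each cached entry maps to a
    (time_saved, energy_saved, money_saved) triple; its score is the
    weighted sum of the min-max-normalized metrics (higher score = more
    worth keeping). Returns the evicted key."""
    metrics = list(cache.values())
    lo = [min(m[i] for m in metrics) for i in range(3)]
    hi = [max(m[i] for m in metrics) for i in range(3)]

    def score(m):
        # Min-max normalize each metric across the cache, then combine.
        return sum(w * ((v - l) / (h - l) if h > l else 1.0)
                   for w, v, l, h in zip(weights, m, lo, hi))

    victim = min(cache, key=lambda k: score(cache[k]))
    del cache[victim]
    return victim
```

Unlike LRU or LFU, which rank entries on a single recency or frequency signal, this scheme lets the device trade off response time, energy, and monetary cost in one scalar, which is the scenario where the paper reports LSR excels.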